application to case study
testing concept vectors
relationship with sufficiency
user study evaluation
Alain, G., & Bengio, Y. (2017). Understanding intermediate layers using linear classifier probes. OpenReview. https://openreview.net/forum?id=ryF7rTqgl
Schmalwasser, L., Penzel, N., Denzler, J., & Niebling, J. (2025). FastCAV: Efficient computation of concept activation vectors for explaining deep neural networks. In Proceedings of the 42nd International Conference on Machine Learning. https://openreview.net/forum?id=kRmfzTfIGe
In PyTorch, use register_forward_hook to capture \(h_l(x)\) during the forward pass.
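A minimal sketch of capturing layer activations with a forward hook. The toy model, the layer name "relu", and the helper `save_activation` are made up for illustration; a real setup would hook a named layer of the network under study.

```python
import torch
import torch.nn as nn

# Toy stand-in model (assumption); sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
).eval()

activations = {}  # layer_name -> tensor(N, dim)

def save_activation(name):
    """Build a hook that stores the flattened output of a layer under `name`."""
    def hook(module, inputs, output):
        # detach: we only want the values here, not the autograd graph
        activations[name] = output.detach().flatten(start_dim=1)
    return hook

# Register the hook on the ReLU layer; keep the handle so it can be removed.
handle = model[1].register_forward_hook(save_activation("relu"))

x = torch.randn(5, 8)
with torch.no_grad():
    logits = model(x)
handle.remove()

print(activations["relu"].shape)  # torch.Size([5, 16])
```

Detaching is fine for collecting activations to train a CAV. Computing sensitivities later needs gradients with respect to the activations, so that step would keep the graph instead.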
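High-dimensional activation spaces (\(d=2048\)): almost any two sets of 50 images are linearly separable, so a CAV can be fit even for a meaningless concept. A random direction may also align with the gradients by chance.
We need to calibrate: is a TCAV score of 0.6 high, or just noise?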
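The separability claim can be checked numerically. The sketch below uses Gaussian vectors as a stand-in for real activations (an assumption) and fits a linear separator to arbitrary labels with a minimum-norm least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2048, 50

# Two sets of n random "activation" vectors with no real difference between them.
X = rng.standard_normal((2 * n, d))
y = np.concatenate([np.ones(n), -np.ones(n)])  # arbitrary set labels

# 100 equations, 2048 unknowns: X w = y has an exact solution,
# and lstsq returns the minimum-norm one.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

train_acc = float(np.mean(np.sign(X @ w) == y))
print(train_acc)
```

Perfect training accuracy on meaningless labels shows that separability alone is no evidence of a meaningful concept direction, which is why a comparison against random sets is needed.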
Step 4. Let \(N_k\) be the number of images in class \(k\). The TCAV Score (\(T_k\)) is:
\[T_k = \frac{1}{N_k}\left|\{x \in \text{Class } k : S_k(x) > 0\}\right|\]
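A small sketch of the score computation. It assumes \(S_k(x)\) is the directional derivative of the class-\(k\) logit along the CAV, that is, the dot product of the layer gradient with \(\vec{v}\). The function name `tcav_score` and the numbers are illustrative only.

```python
import numpy as np

def tcav_score(grads, cav):
    """Fraction of class-k inputs whose sensitivity S_k(x) = grad . cav is positive.

    grads: array (N_k, dim), gradient of the class-k logit w.r.t. layer activations
    cav:   array (dim,), concept activation vector
    """
    sensitivities = grads @ cav  # one S_k(x) per input
    return float(np.mean(sensitivities > 0))

# Made-up gradients for four inputs in a 2-dimensional activation space.
grads = np.array([[1.0, 0.0],
                  [0.5, 0.5],
                  [-1.0, 0.2],
                  [0.3, -0.1]])
cav = np.array([1.0, 0.0])

score = tcav_score(grads, cav)  # sensitivities 1.0, 0.5, -1.0, 0.3: three of four positive
print(score)  # 0.75
```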
Concept: “stripes”
Step 5. Test significance by training reference vectors \(\vec{v}\) on random image sets and comparing their TCAV scores with those of the concept.
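One way to sketch the comparison is shown here. The original TCAV paper uses a two-sided t-test over scores from repeated runs. This example swaps in a two-sided permutation test so that it needs only the standard library. The function name and all scores are made up for illustration.

```python
import random

def permutation_p_value(concept_scores, random_scores, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of mean TCAV scores."""
    rng = random.Random(seed)
    n = len(concept_scores)
    pooled = list(concept_scores) + list(random_scores)

    def mean_gap(values):
        # absolute difference between the means of the first n and the remaining values
        a, b = values[:n], values[n:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_gap(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_gap(pooled) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)

# Hypothetical TCAV scores: concept-vs-random CAVs and random-vs-random CAVs.
concept_scores = [0.92, 0.88, 0.95, 0.90, 0.93, 0.89, 0.94, 0.91]
random_scores = [0.55, 0.42, 0.61, 0.48, 0.39, 0.58, 0.50, 0.47]

p = permutation_p_value(concept_scores, random_scores)
print(p < 0.05)  # True
```

A small p-value suggests the concept's scores differ from what random directions produce. A score such as 0.6 can only be judged against that reference distribution.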
We need three things.
layer_name \(\to\) tensor(N, dim)

import torchvision
from captum.concept import TCAV, Concept
from captum.attr import LayerIntegratedGradients

# 1. load model and specify layers
# (pretrained=True is deprecated in torchvision >= 0.13 in favor of the weights= argument)
model = torchvision.models.googlenet(pretrained=True).eval()
layers = ['inception4c', 'inception4d', 'inception4e']

# 2. folders contain example images
# load_concept: helper assumed to be defined elsewhere; returns an iterable of image tensors
stripes = Concept(id=0, name="striped", data_iter=load_concept("striped/"))
random_concept = Concept(id=1, name="random", data_iter=load_concept("random/"))

from fastcav import FastCAVCaptumClassifier

tcav = TCAV(model=model, layers=layers,
            classifier=FastCAVCaptumClassifier(),  # <-- only change
            layer_attr_method=LayerIntegratedGradients(model, None))

# Same workflow, comparable results, faster
scores = tcav.interpret(inputs=zebra_images, experimental_sets=[[stripes, random_concept]], target=340)

TCAV Score. Nested dictionary structure
tcav_scores = {
    '0-1': {  # concept pair IDs (stripes=0, random=1)
        'inception4c': {
            'sign_count': tensor([0.98, 0.02]),  # fraction positive
            'magnitude': tensor([1.97, -1.97])   # average sensitivity
        },
        ...
    }
}

sign_count. Fraction of inputs for which the concept increases the prediction (the TCAV score proper).
magnitude. Average directional sensitivity across inputs.
sign_count[0] = 0.98 means that “stripes” positively influences the zebra prediction for 98% of the zebra inputs.
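A short sketch of reading the per-layer TCAV score out of such a result. The dictionary below is a hand-written stand-in with the same nesting and illustrative values, and `concept_sign_counts` is a hypothetical helper, not part of Captum.

```python
import torch

# Stand-in for the output of tcav.interpret(...); values are illustrative.
tcav_scores = {
    "0-1": {  # pair key built from the two concept ids
        "inception4c": {
            "sign_count": torch.tensor([0.98, 0.02]),
            "magnitude": torch.tensor([1.97, -1.97]),
        },
    },
}

def concept_sign_counts(scores, pair_key, concept_index=0):
    """Return {layer: sign_count} for one concept of one experimental set."""
    return {
        layer: float(values["sign_count"][concept_index])
        for layer, values in scores[pair_key].items()
    }

pair_key = next(iter(tcav_scores))  # avoid hard-coding the key string
per_layer = concept_sign_counts(tcav_scores, pair_key)
for layer, score in per_layer.items():
    print(f"{layer}: {score:.2f}")
```

Index 0 selects the first concept of the pair (here the striped concept); index 1 selects the random counterpart.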